
    Sampling-based optimization with mixtures

    Sampling-based Evolutionary Algorithms (EA) are of great use when dealing with a highly non-convex and/or noisy optimization task, which is the kind of task we often have to solve in Machine Learning. Two derivative-free examples of such methods are Estimation of Distribution Algorithms (EDA) and techniques based on the Cross-Entropy Method (CEM). One of the main problems these algorithms have to solve is finding a good surrogate model for the normalized target function, that is, a model which has sufficient complexity to fit this target function, but which keeps the computations simple enough. Gaussian mixture models have been applied in practice with great success, but most of these approaches lacked a solid theoretical foundation. In this paper we describe a sound mathematical justification for Gaussian mixture surrogate models; more precisely, we propose a proper derivation of an EDA/CEM algorithm with mixture updates using Expectation Maximization techniques. It will appear that this algorithm resembles the recent Population MCMC schemes, thus reinforcing the link between Monte Carlo integration methods and sampling-based optimization. We concentrate throughout this paper on continuous optimization.
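
    The abstract describes the general pattern of fitting a mixture surrogate to promising samples and resampling from it. The sketch below illustrates that pattern with a plain elite-based CEM loop in which the surrogate is a Gaussian mixture refitted by EM at every iteration; the toy objective, the hyperparameters, and the use of scikit-learn's GaussianMixture are illustrative assumptions, not the paper's derivation.

        # Minimal CEM/EDA-style loop with a Gaussian mixture surrogate refitted
        # by EM on the elite samples at every iteration. The toy objective and
        # all hyperparameters below are illustrative assumptions.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        def f(x):
            # toy bimodal objective to be maximized, with optima near (2, 2) and (-2, -2)
            return -np.minimum(np.sum((x - 2.0) ** 2, axis=1),
                               np.sum((x + 2.0) ** 2, axis=1))

        rng = np.random.default_rng(0)
        dim, n_samples, n_elite = 2, 200, 40
        mix = GaussianMixture(n_components=2, covariance_type="full")
        samples = rng.normal(scale=5.0, size=(n_samples, dim))   # initial population

        for it in range(30):
            scores = f(samples)
            elite = samples[np.argsort(scores)[-n_elite:]]   # keep the best samples
            mix.fit(elite)                                   # EM update of the mixture surrogate
            samples, _ = mix.sample(n_samples)               # resample from the surrogate

        print("best point found:", samples[np.argmax(f(samples))])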

    Adaptive MCMC with online relabeling

    When targeting a distribution that is artificially invariant under some permutations, Markov chain Monte Carlo (MCMC) algorithms face the label-switching problem, rendering marginal inference particularly cumbersome. Such a situation arises, for example, in the Bayesian analysis of finite mixture models. Adaptive MCMC algorithms such as adaptive Metropolis (AM), which self-calibrate their proposal distribution using an online estimate of the covariance matrix of the target, are no exception. To address the label-switching issue, relabeling algorithms associate a permutation to each MCMC sample, trying to obtain reasonable marginals. In the case of adaptive Metropolis (Bernoulli 7 (2001) 223-242), an online relabeling strategy is required. This paper is devoted to the AMOR algorithm, a provably consistent variant of AM that can cope with the label-switching problem. The idea is to nest relabeling steps within the MCMC algorithm, based on the estimation of a single covariance matrix that is used both for adapting the covariance of the proposal distribution in the Metropolis step and for online relabeling. We compare the behavior of AMOR to similar relabeling methods. In the case of compactly supported target distributions, we prove a strong law of large numbers for AMOR and its ergodicity. These are the first results on the consistency of an online relabeling algorithm to our knowledge. The proof underlines latent relations between relabeling and vector quantization. Published at http://dx.doi.org/10.3150/13-BEJ578 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
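
    A minimal sketch of the idea of nesting relabeling inside adaptive Metropolis follows: a single running mean and covariance drive both the Gaussian proposal and the choice of permutation applied to each sample. The toy permutation-symmetric target, the proposal scaling, and the simple online moment updates are assumptions for illustration, not the AMOR algorithm as analyzed in the paper.

        # Sketch of adaptive Metropolis with a nested relabeling step driven by a
        # single running mean/covariance, in the spirit of AMOR. The toy target
        # (invariant under swapping the two coordinates), the proposal scaling,
        # and the simple online moment updates are illustrative assumptions.
        import itertools
        import numpy as np

        rng = np.random.default_rng(0)

        def log_target(x):
            # unnormalized log-density on R^2, symmetric under permuting coordinates
            modes = [np.array([-3.0, 3.0]), np.array([3.0, -3.0])]
            return np.logaddexp(-0.5 * np.sum((x - modes[0]) ** 2),
                                -0.5 * np.sum((x - modes[1]) ** 2))

        def relabel(x, mean, cov_inv):
            # choose the coordinate permutation closest to the running mean
            # in the Mahalanobis metric of the running covariance
            perms = [np.array(p) for p in itertools.permutations(range(len(x)))]
            dists = [(x[p] - mean) @ cov_inv @ (x[p] - mean) for p in perms]
            return x[perms[int(np.argmin(dists))]]

        x = rng.normal(size=2)
        mean, cov = x.copy(), np.eye(2)
        chain = []
        for t in range(1, 5000):
            prop = x + rng.multivariate_normal(np.zeros(2), (2.38 ** 2 / 2) * cov)
            if np.log(rng.uniform()) < log_target(prop) - log_target(x):
                x = prop
            y = relabel(x, mean, np.linalg.inv(cov + 1e-6 * np.eye(2)))
            # the same running moments are used for the proposal and for relabeling
            mean += (y - mean) / (t + 1)
            cov += (np.outer(y - mean, y - mean) - cov) / (t + 1)
            chain.append(y)

        print("relabeled posterior mean:", np.mean(chain, axis=0))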

    Bandit-Aided Boosting

    In this paper we apply multi-armed bandits (MABs) to accelerate ADABOOST. ADABOOST constructs a strong classifier in a stepwise fashion by selecting simple base classifiers and using their weighted "vote" to determine the final classification. We model this stepwise base classifier selection as a sequential decision problem, and optimize it with MABs. Each arm represents a subset of the base classifier set. The MAB gradually learns the "utility" of the subsets, and selects one of the subsets in each iteration. ADABOOST then searches only this subset instead of optimizing the base classifier over the whole space. The reward is defined as a function of the accuracy of the base classifier. We investigate how the MAB algorithms (UCB, UCT) can be applied in the case of boosted stumps, trees, and products of base classifiers. On benchmark datasets, our bandit-based approach achieves only slightly worse test errors than the standard boosted learners, at a computational cost that is an order of magnitude smaller than with standard ADABOOST.
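
    The sketch below shows the bandit-inside-boosting pattern in its simplest form: a UCB rule picks where to look for the next weak learner at every boosting iteration, and the weak learner's weighted edge serves as the reward. Treating each individual feature as an arm, the decision-stump learner, and the exact reward definition are illustrative assumptions; the paper works with subsets of the base-classifier set and also covers UCT, trees, and products of base classifiers.

        # Sketch of UCB-driven base-classifier selection inside AdaBoost. One arm
        # per feature, decision stumps, and the edge-based reward are illustrative
        # assumptions, not the paper's exact setup.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=500, n_features=20, random_state=0)
        y = 2 * y - 1                                  # labels in {-1, +1}
        w = np.full(len(y), 1.0 / len(y))              # AdaBoost sample weights
        n_arms = X.shape[1]                            # one arm per feature (illustrative)
        counts, rewards = np.zeros(n_arms), np.zeros(n_arms)
        F = np.zeros(len(y))                           # ensemble scores

        for t in range(1, 200):
            # UCB1: play every arm once, then exploit mean reward plus exploration bonus
            bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
            ucb = np.where(counts == 0, np.inf, rewards / np.maximum(counts, 1) + bonus)
            arm = int(np.argmax(ucb))
            stump = DecisionTreeClassifier(max_depth=1).fit(X[:, [arm]], y, sample_weight=w)
            h = stump.predict(X[:, [arm]])
            err = np.clip(np.sum(w * (h != y)), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)      # AdaBoost coefficient
            w = w * np.exp(-alpha * y * h)
            w /= w.sum()
            F += alpha * h
            counts[arm] += 1
            rewards[arm] += 1 - 2 * err                # reward: weighted edge of the stump

        print("training error:", np.mean(np.sign(F) != y))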

    Predicting Bounds on Queuing Delay in the EGEE grid

    Predicting the performance of schedulers is a notoriously difficult task. As a consequence, grid users might be tempted to work around the standard grid middleware by designing specific strategies, which would be counterproductive if generally adopted. On the other hand, Machine Learning has been successfully applied to performance prediction in distributed and shared environments. This paper reports on experiments on predicting the basic parameters of scheduling in the EGEE framework.

    MDDAG: learning deep decision DAGs in a Markov decision process setup

    In this paper we propose an algorithm that builds sparse decision DAGs (directed acyclic graphs) out of a list of features or base classifiers. The basic idea is to cast the DAG design task as a Markov decision process. Each instance can decide to use or to skip each base classifier, based on the current state of the classifier being built. The result is a sparse decision DAG where the base classifiers are selected in a data-dependent way. The development of the algorithm was directly motivated by improving the traditional cascade design in applications where the computational requirements of classifying a test instance are as important as the performance of the classifier itself. Besides outperforming classical cascade designs on benchmark data sets, the algorithm also produces interesting deep structures where similar input data follows the same path in the DAG, and subpaths of increasing length represent features of increasing complexity.
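
    The Markov decision process cast described in the abstract can be pictured as an episode per instance: the instance walks the ordered list of base classifiers and a policy decides, at each position, whether to evaluate, skip, or quit. The sketch below is only meant to make that state/action/reward structure concrete; the state representation, the random policy, the toy stumps, and the reward shape are illustrative assumptions, not the paper's exact setup.

        # Sketch of one MDDAG-style episode: an instance walks an ordered list of
        # base classifiers and, at each step, a policy decides to evaluate, skip,
        # or quit. Everything below is an illustrative assumption.
        import numpy as np

        rng = np.random.default_rng(0)
        EVAL, SKIP, QUIT = 0, 1, 2

        def episode(x, y, base_classifiers, alphas, policy, cost_penalty=0.01):
            score, n_evaluated = 0.0, 0
            for j, (h, alpha) in enumerate(zip(base_classifiers, alphas)):
                state = (j, score)                 # position in the list + running score
                action = policy(state)
                if action == QUIT:
                    break
                if action == EVAL:
                    score += alpha * h(x)          # add the weighted vote of this base classifier
                    n_evaluated += 1
                # SKIP: move on without evaluating h
            correct = float(np.sign(score) == y)
            # the reward trades off correctness against the number of evaluations
            return correct - cost_penalty * n_evaluated

        # toy ingredients: ten one-dimensional threshold stumps with equal weights
        stumps = [lambda x, t=t: 1.0 if x > t else -1.0 for t in np.linspace(-1, 1, 10)]
        alphas = np.ones(10)
        random_policy = lambda state: rng.integers(0, 3)
        print(episode(0.7, +1, stumps, alphas, random_policy))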

    Fast classification using sparse decision DAGs

    ISBN: 978-1-4503-1285-1
    In this paper we propose an algorithm that builds sparse decision DAGs (directed acyclic graphs) out of a list of base classifiers provided by an external learning method such as AdaBoost. The basic idea is to cast the DAG design task as a Markov decision process. Each instance can decide to use or to skip each base classifier, based on the current state of the classifier being built. The result is a sparse decision DAG where the base classifiers are selected in a data-dependent way. The method has a single hyperparameter with a clear semantics: it controls the accuracy/speed trade-off. The algorithm is competitive with state-of-the-art cascade detectors on three object-detection benchmarks, and it clearly outperforms them when the number of base classifiers is low. Unlike cascades, it is also readily applicable to multi-class classification. Using the multi-class setup, we show on a benchmark web page ranking data set that we can significantly improve the decision speed without harming the performance of the ranker.
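
    The single hyperparameter mentioned in the abstract can be pictured as a per-evaluation penalty in the episode reward of the MDDAG sketch above. The sweep below is purely illustrative (hypothetical penalty values, reusing the toy episode, stumps, and random policy defined earlier), not the paper's experimental protocol.

        # Illustrative sweep of an assumed per-evaluation penalty: a larger penalty
        # favors quitting early, i.e. faster but potentially less accurate
        # classification, which is the accuracy/speed trade-off described above.
        for cost_penalty in (0.0, 0.01, 0.1):
            print(cost_penalty, episode(0.7, +1, stumps, alphas, random_policy, cost_penalty))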